
Europäisches
Patentamt



European 
Patent Office 



PCT / IR fl 3 / Q S 46 5 



Office européen
des brevets



27 NOV 2003



Bescheinigung Certificate 




Die angehefteten Unterlagen
stimmen mit der
ursprünglich eingereichten
Fassung der auf dem nächsten
Blatt bezeichneten
europäischen Patentanmeldung
überein.



The attached documents 
are exact copies of the 
European patent application 
described on the following 
page, as originally filed 



Les documents fixés à
cette attestation sont
conformes à la version
initialement déposée de
la demande de brevet
européen spécifiée à la
page suivante.



Patentanmeldung Nr. Patent application No. Demande de brevet n° 

02292994.7 



EPA/EPO/OEB Form 1014.1 - 02.2000 7001014 



Der Präsident des Europäischen Patentamts;
Im Auftrag

For the President of the European Patent Office

Le Président de l'Office européen des brevets
p.o. 



R C van Dijk 

PRIORITY 

DOCUMENT 

SUBMITTED OR TRANSMITTED IN 
COMPLIANCE WITH RULE 17.1(a) OR (b) 









Anmeldung Nr:
Application no.:
Demande no:

02292994.7

Anmeldetag:

Date of filing: 04.12.02
Date de dépôt:

Anmelder/Applicant(s)/Demandeur(s):

Koninklijke Philips Electronics N.V.
Groenewoudseweg 1 
5621 BA Eindhoven 
PAYS-BAS 



Bezeichnung der Erfindung/Title of the invention/Titre de l'invention:
(Falls die Bezeichnung der Erfindung nicht angegeben ist, siehe Beschreibung.
If no title is shown please refer to the description.
Si aucun titre n'est indiqué se référer à la description.)

Video coding method and device

In Anspruch genommene Priorität(en) / Priority(ies) claimed / Priorité(s)
revendiquée(s)

Staat/Tag/Aktenzeichen/State/Date/File no./Pays/Date/Numéro de dépôt:

Internationale Patentklassifikation/International Patent Classification/
Classification internationale des brevets:

H04N7/26

Am Anmeldetag benannte Vertragstaaten/Contracting states designated at date of
filing/Etats contractants désignés lors du dépôt:

AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LI LU MC NL PT SE SI SK 



02292994.7 
EPA/EPO/OEB Form 1014.2 - 01.2000




"VIDEO CODING METHOD AND DEVICE" 



PHFR020135 EPp 



FIELD OF THE INVENTION 

The present invention relates to the field of video compression and, more
particularly, to a three-dimensional (3D) video coding method for the compression of a
bitstream corresponding to an original video sequence that has been divided into successive
groups of frames (GOFs) the size of which is N = 2^n with n being an integer, said coding
method comprising the following steps, applied to each successive GOF of the sequence :

a) a spatio-temporal analysis step, leading to a spatio-temporal multiresolution
decomposition of the current GOF into low and high frequency temporal subbands, said step
itself comprising :

- a motion estimation sub-step ;

- based on said motion estimation, a motion compensated temporal filtering sub-step,
performed on each of the 2^(n-1) couples of frames of the current GOF ;

- a spatial analysis sub-step, performed on the subbands resulting from said
temporal filtering sub-step ;

b) an encoding step, said step itself comprising :

- an entropy coding sub-step, performed on said low and high frequency temporal
subbands resulting from the spatio-temporal analysis step and on motion vectors obtained by
means of said motion estimation step ;

- an arithmetic coding sub-step, applied to the coded sequence thus obtained and 
delivering an embedded coded bitstream. 

The invention also relates to a corresponding coding device, for the implementation 
of said coding method. 

BACKGROUND OF THE INVENTION 

From MPEG-1 to H.26L, standard video compression schemes were based on so-called
hybrid solutions : a hybrid video encoder uses a predictive scheme where each current
frame of the input video sequence is temporally predicted from a given reference frame, and
the prediction error thus obtained by difference between said current frame and its prediction is
spatially transformed (the transform is for instance a bidimensional DCT transform) in order to
take advantage of spatial redundancies. A more recent approach, called 3D (or 2D+t) subband
analysis, has then consisted in processing a group of frames (GOF) as a three-dimensional
structure and spatio-temporally filtering it in order to compact the energy in the low
frequencies.

The introduction of a motion compensation step in this 3D subband decomposition
scheme improves the overall coding efficiency and leads to a spatio-temporal
multiresolution (hierarchical) representation of the video signal thanks to a subband tree. As
depicted for instance in Fig.1 showing such a 3D wavelet decomposition with motion
compensation, each GOF of the input video sequence, including in the illustrated case eight
frames F1 to F8, is first motion-compensated (MC) in order to process sequences with large
motion, and then temporally filtered (TF) using Haar wavelets (the dotted arrows correspond to
a high-pass temporal filtering, while the other ones correspond to a low-pass temporal filtering).
Three stages of decomposition are shown (L and H = first stage ; LL and LH = second stage ;
LLL and LLH = third stage), a group of motion vector fields (respectively MV4, MV3, MV2) being
generated at each temporal decomposition level. The high frequency temporal subbands of
each level (H, LH and LLH in the above example) and the low frequency temporal subband(s) of
the deepest one (LLL) are then spatially analyzed through a wavelet filter, and an entropy
encoder encodes the wavelet coefficients resulting from this spatio-temporal
decomposition. All these operations are similarly applied to the successive GOFs of the input
video sequence.
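As a rough illustration of the temporal part of this decomposition, the following Python sketch applies an orthonormal Haar filter to successive couples of frames, level by level, and names the resulting subbands as in Fig.1. It is a simplified sketch under stated assumptions, not the scheme of Fig.1 itself: motion compensation is omitted, frames are modelled as flat lists of samples, the sign convention of the high-pass output is arbitrary, and the function names `haar_pair` and `temporal_decompose` are hypothetical.

```python
from math import sqrt


def haar_pair(frame_a, frame_b):
    """Orthonormal Haar filtering of one couple of frames (no motion compensation)."""
    low = [(a + b) / sqrt(2) for a, b in zip(frame_a, frame_b)]
    high = [(a - b) / sqrt(2) for a, b in zip(frame_a, frame_b)]
    return low, high


def temporal_decompose(frames, levels):
    """Return a dict mapping subband names (H, LH, ..., LL...L) to lists of frames."""
    subbands = {}
    current = frames
    prefix = ""
    for _ in range(levels):
        lows, highs = [], []
        # Filter each couple of consecutive frames of the current level.
        for i in range(0, len(current), 2):
            low, high = haar_pair(current[i], current[i + 1])
            lows.append(low)
            highs.append(high)
        subbands[prefix + "H"] = highs   # high-pass output of this level
        prefix += "L"
        current = lows                   # only the low-pass output is decomposed further
    subbands[prefix] = current           # final approximation subband(s)
    return subbands


# Eight one-sample frames, three levels, as in the eight-frame example.
gof = [[1.0] for _ in range(8)]
result = temporal_decompose(gof, levels=3)
```

With eight frames and three levels the sketch yields four H frames, two LH frames, one LLH frame and one LLL frame. Calling it with `levels=2` instead leaves two approximation frames in the `LL` subband.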

Among the different entropy coding techniques that can be used to encode the 3D
wavelet coefficients resulting from this subband decomposition, the so-called 3D-SPIHT
algorithm, described for example in the document "Low bit-rate scalable video coding with 3D
set partitioning in hierarchical trees (3D-SPIHT)", B.J.Kim, Z.Xiong and W.A.Pearlman, IEEE
Transactions on Circuits and Systems for Video Technology, vol.10, n°8, December 2000,
pp.1374-1387, is one of the most efficient ones (as is its extension to support scalability,
described in "A fully scalable 3D subband video codec", V.Bottreau, M.Benetiere,
B.Pesquet-Popescu and B.Felts, Proceedings of IEEE International Conference on Image
Processing, ICIP 2001, vol.2, pp.1017-1020, Thessaloniki, Greece, October 7-10, 2001).

This 3D-SPIHT algorithm, presented in Fig.2 that illustrates the parent-offspring
dependencies observed in the spatio-temporal orientation trees resulting from
the subband decomposition (the notations in Fig.2 are the following : TF = temporal
frame, TAS = temporal approximation subbands LL, CFTS = coefficients in the spatio-temporal
approximation subbands (or root coefficients), TDS.LRL = temporal detail
subbands LH at the last resolution level of the decomposition, and TDS.HR = temporal
detail subbands H at higher resolution), is based on a key concept : the prediction of the
absence of significant information across successive scales of the wavelet decomposition,
by exploiting the self-similarity inherent to natural images (i.e. if a coefficient is
insignificant according to a given criterion at the lowest scale of the decomposition, the
coefficients corresponding to the same area at the other scales of said decomposition
have a high probability to be insignificant as well). The 3D-SPIHT algorithm uses a tree
structure - the spatio-temporal orientation tree - that naturally defines the spatial and
temporal relationships inside the hierarchical pyramid of the wavelet coefficients (the
roots of the trees are composed of the pixels of the approximation subband - or root
subband - at the lowest resolution, and the direct descendants - or offspring - of a node
correspond to the pixels of the same volume and direction in the next finer level of the
pyramid), and looks for zerotrees in the wavelet subbands in order to reduce
redundancies between them. The wavelet coefficients are finally encoded according to
their nature : root of a possible zero-tree (or insignificant set), insignificant pixel, and
significant pixel.
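The three coefficient categories named above can be illustrated with a small significance test on a generic tree. The Python sketch below is a deliberate simplification and not the actual list-based sorting pass of SPIHT: it assumes a coefficient is significant at bit plane k when its magnitude is at least 2^k, and the `Node` class and the `classify` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One wavelet coefficient and its offspring in an orientation tree."""
    value: float
    children: List["Node"] = field(default_factory=list)


def max_descendant_magnitude(node):
    """Largest coefficient magnitude among all descendants (0.0 for a leaf)."""
    best = 0.0
    for child in node.children:
        best = max(best, abs(child.value), max_descendant_magnitude(child))
    return best


def classify(node, bitplane):
    """Classify a coefficient against the threshold 2**bitplane (assumed criterion)."""
    threshold = 2 ** bitplane
    if abs(node.value) >= threshold:
        return "significant"
    # The coefficient itself is insignificant; check the whole tree below it.
    # A leaf has no descendants, so it is trivially reported as a zerotree root.
    if max_descendant_magnitude(node) < threshold:
        return "zerotree root"
    return "insignificant"


tree = Node(3, [Node(1, [Node(0.5)]), Node(2)])
```

Here `classify(tree, 2)` reports a zerotree root, because every magnitude in the tree is below 4, so the whole set can be signalled with a single symbol.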

In the literature, when the 3D-SPIHT is used, the temporal decomposition is
stopped (see Fig.3, to be compared to the case of a complete decomposition illustrated in Fig.1)
before the final (potential) decomposition step that would lead to a single low-frequency
temporal subband. Indeed the first temporal dependencies between wavelet coefficients are
applied between the two approximation subbands LL. The meaning of these coefficients is
coherent since they are approximation wavelet coefficients at the same decomposition level,
but said coefficients are highly decorrelated because they contain information from very
different parts of the sequence (LL0 is indeed computed from the four first input frames of the
GOF and LL1 from the four last frames of the same GOF).
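The statement that LL0 and LL1 come from disjoint parts of the GOF can be checked by tracking which input frames each temporal subband depends on. The Python sketch below does only that bookkeeping. It assumes dyadic filtering on consecutive couples of frames, with frames numbered from 1 as F1 to F8, and it assumes motion compensation does not change which frames feed a subband. The function name `subband_supports` is hypothetical.

```python
def subband_supports(n_frames, levels):
    """Map each temporal subband to the set of input frames (1-based) it depends on."""
    current = [{i} for i in range(1, n_frames + 1)]
    supports = {}
    prefix = ""
    for _ in range(levels):
        # Each couple of frames produces one low-pass and one high-pass frame,
        # both depending on the union of the frames feeding the couple.
        merged = [current[i] | current[i + 1] for i in range(0, len(current), 2)]
        for k, frames in enumerate(merged):
            supports[f"{prefix}H{k}"] = frames
        current = merged
        prefix += "L"
    for k, frames in enumerate(current):
        supports[f"{prefix}{k}"] = frames
    return supports


# Eight frames, decomposition stopped after two levels.
supports = subband_supports(8, 2)
```

For eight frames and two levels, `supports["LL0"]` is the set of frames 1 to 4 and `supports["LL1"]` the set of frames 5 to 8, so the two approximation subbands share no input frame.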


SUMMARY OF THE INVENTION 

It is then an object of the invention to propose a more efficient coding method with
which the dependencies at this deep temporal decomposition level, which do not play a major
role in the efficiency of the SPIHT approach (the benefit of exploiting inter-subband correlation
appears especially in the first steps of the decomposition), are removed.

To this end, the invention relates to a coding method such as defined in the
introductory part of the description and which is moreover characterized in that, when said
temporal filtering sub-step comprises (n-1) levels, the final temporal decomposition level that
would have led to a single low-frequency subband being omitted, the spatio-temporal analysis
and encoding steps are performed according to the following rules :

(a) each current input GOF is split into two new GOFs with half the original size,
said new GOFs being independent and comprising respectively the 2^(n-1) first frames and the
2^(n-1) last ones of said input GOF ;

(b) in each of these two new GOFs, a complete temporal decomposition with (n-1)
levels is performed down to the last low frequency temporal subband in order to get only one
final approximation subband for each of said new GOFs ;

(c) a modified 3D-SPIHT scanning is applied consecutively and independently on
these two new GOFs, the spatio-temporal orientation trees used by said SPIHT scanning for
defining the spatio-temporal relationships inside the hierarchical pyramid of the wavelet
coefficients including now half the original number of subbands with respect to a (n-1)-level
temporal decomposition conventionally performed on the original GOF.
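Rules (a) and (b) can be made concrete by counting temporal frames per subband before and after the split. The Python sketch below is only a counting aid under the assumption of dyadic temporal filtering. It performs no filtering and no SPIHT scanning, and the names `split_gof` and `subband_sizes` are hypothetical.

```python
def split_gof(frames):
    """Rule (a): split a GOF of 2**n frames into its first and last halves."""
    n_frames = len(frames)
    if n_frames < 2 or n_frames & (n_frames - 1):
        raise ValueError("GOF size must be a power of two, at least 2")
    half = n_frames // 2
    return frames[:half], frames[half:]


def subband_sizes(n_frames, levels):
    """Number of temporal frames in each subband after `levels` dyadic levels."""
    sizes = {}
    remaining = n_frames
    prefix = ""
    for _ in range(levels):
        remaining //= 2              # each level halves the low-pass frame count
        sizes[prefix + "H"] = remaining
        prefix += "L"
    sizes[prefix] = remaining        # approximation subband(s) left at the end
    return sizes


# N = 8 frames (n = 3), so the temporal filtering has n - 1 = 2 levels.
gof_0, gof_1 = split_gof(list(range(1, 9)))
conventional = subband_sizes(8, 2)        # {'H': 4, 'LH': 2, 'LL': 2}
per_new_gof = subband_sizes(len(gof_0), 2)  # {'H': 2, 'LH': 1, 'LL': 1}
```

In this example each new GOF holds half as many frames in every subband as the conventional two-level decomposition of the original GOF, and exactly one approximation frame, which is what rule (b) requires.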

The invention also relates to a coding device for carrying out said coding
method.

BRIEF DESCRIPTION OF DRAWINGS

The present invention will now be described, by way of example, with reference to 
the accompanying drawings in which : 

- Fig.1 shows a 3D wavelet decomposition with motion compensation, applied to a
GOF of the input video sequence ;

- Fig.2 shows the parent-offspring dependencies observed in the spatio-temporal
orientation trees resulting from said subband decomposition ;

- Fig.3 illustrates the case of an uncompleted temporal multiresolution analysis with
motion compensation as performed in previous solutions applying the 3D-SPIHT algorithm, said
decomposition being stopped before the final decomposition step that leads to a single
low-frequency temporal subband ;

- Fig.4 illustrates a temporal decomposition performed in accordance with the
principle of the invention ;

- Fig.5 shows the new parent-offspring dependencies observed in the spatio-temporal
orientation trees when performing the temporal decomposition in accordance with said
principle of the invention.

DETAILED DESCRIPTION OF THE INVENTION 

In order to remove dependencies between the two approximation subbands LL0
and LL1 of the uncompleted temporal decomposition of Fig.3, it is first proposed to split the
current input GOF into two separate new GOFs with half the original size. A temporal
decomposition is then performed for each separate GOF, said temporal decomposition being
complete (i.e. performed down to the last low temporal subband) in order to get only one final
approximation subband for each new GOF.

This new temporal decomposition is illustrated in Fig.4, in which the vertical dashed
line shows the new natural separation for the GOF structure. Each new GOF can be considered
as independent and all the information corresponding to these two GOFs is transmitted
independently. All the information of "GOF 0" is transmitted first (motion vectors and
subbands), the natural order for the subband transmission being LL0, LH0, H0 and finally H1,
and all the information of "GOF 1" is then transmitted, the natural order for the subband
transmission being LL1, LH1, H2 and finally H3.
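The subband transmission order quoted above follows a regular pattern: the approximation subband first, then the detail subbands from the deepest temporal level to the first one. The Python sketch below generates that order for one new GOF. It covers subband names only (motion vectors are left out), the numbering rule for the detail subbands is inferred from the two orders given in the text, and the function name `transmission_order` is hypothetical.

```python
def transmission_order(gof_index, gof_size):
    """Subband names for one new GOF, approximation first, finest details last."""
    levels = gof_size.bit_length() - 1
    if gof_size < 2 or 1 << levels != gof_size:
        raise ValueError("GOF size must be a power of two, at least 2")
    # Single approximation subband of this GOF, e.g. "LL0" for GOF 0 of size 4.
    order = ["L" * levels + str(gof_index)]
    # Detail subbands, from the deepest temporal level down to the first one.
    for level in range(levels, 0, -1):
        count = gof_size >> level        # detail frames produced at this level
        first = gof_index * count        # numbering continues across the new GOFs
        order += ["L" * (level - 1) + "H" + str(first + k) for k in range(count)]
    return order


order_gof_0 = transmission_order(0, 4)
order_gof_1 = transmission_order(1, 4)
```

For two new GOFs of four frames each, the sketch returns LL0, LH0, H0, H1 for "GOF 0" and LL1, LH1, H2, H3 for "GOF 1", matching the orders stated in the text.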

Starting from this new temporal decomposition, it is also proposed to modify the
original SPIHT scanning of Fig.2, in order to discard dependencies between subbands from
different GOFs. This new scanning is applied consecutively on the two new GOFs, and a
different set of parent-offspring dependencies, shown in Fig.5, is used to remove the
dependencies between the two approximation subbands LL0 and LL1, and therefore the
dependencies between the two new GOFs.

The technical solution thus proposed halves the number of frames per GOF for a
given number of decomposition levels. This can be considered as a major improvement when
compared to the original solution, because it halves the memory requirement both at the
encoding side and at the decoding side. Moreover, this approach does not bring any penalty to
the coding efficiency, since the modified dependencies only affect the temporal approximation
subbands that can be considered as uncorrelated.
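The memory argument amounts to simple arithmetic, sketched below in Python. The memory model is an assumption made for illustration only: the buffer is taken to hold exactly one GOF of frames, with one sample per pixel and one byte per sample, and the 352 x 288 frame size is an arbitrary example that does not come from the text. The function names are hypothetical.

```python
def frames_per_gof(levels, split):
    """Frames in one GOF for a given number of temporal decomposition levels."""
    # Conventional scheme: (n-1) levels on a GOF of 2**n frames.
    # Proposed scheme: the same (n-1) levels on a GOF of 2**(n-1) frames.
    return 2 ** levels if split else 2 ** (levels + 1)


def gof_buffer_bytes(width, height, levels, split, bytes_per_sample=1):
    """Bytes needed to hold one GOF under the simplified memory model."""
    return width * height * bytes_per_sample * frames_per_gof(levels, split)


conventional = gof_buffer_bytes(352, 288, levels=2, split=False)
proposed = gof_buffer_bytes(352, 288, levels=2, split=True)
```

Under these assumptions the conventional scheme buffers 8 frames (811008 bytes) and the proposed one 4 frames (405504 bytes) for the same two temporal levels.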

It may be noted that the new SPIHT scanning illustrated in Fig.5 could be
associated successfully with the original GOF size of Fig.1 : in that case, the subband
transmission can be interleaved in order to send the most important information first (the
transmission order would then be the original transmission order : LL0, LL1, LH0, LH1, H0, H1,
H2, H3). Nevertheless, even though the dependencies between the approximation subbands
have been removed, the GOF size is the original GOF size and the benefit in terms of memory
requirements is lost.
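The interleaved order mentioned above can be obtained by grouping the subbands of both halves level by level rather than half by half. The following Python sketch shows one way to do so. The representation of each half as a list of per-level groups and the function name `interleave_by_level` are assumptions made for illustration.

```python
def interleave_by_level(*halves):
    """Merge per-level subband groups so that each level is sent for all halves in turn."""
    order = []
    for groups in zip(*halves):      # same temporal level across all halves
        for group in groups:
            order.extend(group)
    return order


# Subbands of each half, grouped from the approximation level to the finest details.
half_0 = [["LL0"], ["LH0"], ["H0", "H1"]]
half_1 = [["LL1"], ["LH1"], ["H2", "H3"]]
interleaved = interleave_by_level(half_0, half_1)
```

The result is LL0, LL1, LH0, LH1, H0, H1, H2, H3, i.e. the original transmission order quoted in the text.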




CLAIMS: 

1. A three-dimensional (3D) video coding method for the compression of a bitstream
corresponding to an original video sequence that has been divided into successive groups of
frames (GOFs) the size of which is N = 2^n with n being an integer, said coding method
comprising the following steps, applied to each successive GOF of the sequence :

a) a spatio-temporal analysis step, leading to a spatio-temporal multiresolution 
decomposition of the current GOF into low and high frequency temporal subbands, said step 
itself comprising : 

- a motion estimation sub-step ; 

- based on said motion estimation, a motion compensated temporal filtering sub-step,
performed on each of the 2^(n-1) couples of frames of the current GOF ;

- a spatial analysis sub-step, performed on the subbands resulting from said 
temporal filtering sub-step ; 

b) an encoding step, said step itself comprising : 

- an entropy coding sub-step, performed on said low and high frequency temporal 
subbands resulting from the spatio-temporal analysis step and on motion vectors obtained by 
means of said motion estimation step ; 

- an arithmetic coding sub-step, applied to the coded sequence thus obtained and 
delivering an embedded coded bitstream ; 

said coding method being further characterized in that, when said temporal filtering sub-step 
comprises (n-1) levels, the final temporal decomposition level that would have led to a single 
low-frequency subband being omitted, the spatio-temporal analysis and encoding steps are 
performed according to the following rules : 

(a) each current input GOF is split into two new GOFs with half the original size,
said new GOFs being independent and comprising respectively the 2^(n-1) first frames and the
2^(n-1) last ones of said input GOF ;

(b) in each of these two new GOFs, a complete temporal decomposition with (n-1)
levels is performed down to the last low frequency temporal subband in order to get only one
final approximation subband for each of said new GOFs ;

(c) a modified 3D-SPIHT scanning is applied consecutively and independently on
these two new GOFs, the spatio-temporal orientation trees used by said SPIHT scanning for
defining the spatio-temporal relationships inside the hierarchical pyramid of the wavelet
coefficients including now half the original number of subbands with respect to a (n-1)-level
temporal decomposition conventionally performed on the original GOF.

2. A video coding device for the implementation of the three-dimensional video coding
method according to claim 1.

ABSTRACT :

The invention relates to a three-dimensional (3D) video coding method for the
compression of a bitstream corresponding to an original video sequence that has been divided
into successive groups of N = 2^n frames (GOFs), comprising the following steps, applied to each
successive GOF :

a) a spatio-temporal analysis step, leading to a spatio-temporal multiresolution
decomposition of the current GOF into low and high frequency temporal subbands, said step
itself comprising a motion estimation sub-step, a motion compensated temporal filtering sub-step,
performed on each of the 2^(n-1) couples of frames of the current GOF, and a spatial analysis
sub-step, performed on the subbands resulting from said temporal filtering sub-step ;

b) an encoding step, comprising entropy and arithmetic coding sub-steps. 
According to the invention, when said temporal filtering sub-step comprises (n-1) levels, the
final temporal decomposition level that would have led to a single low-frequency subband being
omitted, the spatio-temporal analysis and encoding steps are performed according to the
following rules : (a) each current input GOF is split into two new GOFs with half the original
size ; (b) in each of these two new GOFs, a complete temporal decomposition with (n-1) levels
is performed down to the last low frequency temporal subband ; (c) a modified 3D-SPIHT
scanning is applied consecutively and independently on these two new GOFs.